Introducing spatial data

Spatial data types

  • Vector data
  • represents space using points, lines and polygons (“features”)
  • each of these can have additional data (“attributes”)
  • R package sf

  • Raster data
  • represents space using cells of equal size
  • each cell can have additional data (“layers”)
  • R package raster

Vector data

Vector data

Raster data

Raster data

Simple features

What is a feature?

  • A feature is a thing… a city, a house, a tree
  • has a geometry describing where it is located
  • has attributes describing its other properties

  • A feature may be made up of other features
  • pixel vs. image

What are simple features?

  • open standard for spatial data types

  • points, lines and polygons = “geometries”

  • geometries are hierarchical
  • set of points is a line
  • set of lines forms a polygon
  • several polygons is a multipolygon

  • 17 geometries, of which we look at 7

  • “data frames with a spatial extension”

sf geometry types

  • A point: st_point()
  • A linestring: st_linestring()
  • A polygon: st_polygon()
  • A multipoint: st_multipoint()
  • A multilinestring: st_multilinestring()
  • A multipolygon: st_multipolygon()
  • A geometry collection: st_geometrycollection()

  • Can be created from vectors, matrices, or lists

Features made of single geometries

# POINT
(pt1 <- st_point(c(5, 2)))

# LINESTRING
(ls1 <- st_linestring(rbind(c(5, 2), c(1, 3), c(3, 4))))

# POLYGON
pl <- list(rbind(c(5, 2), c(1, 3), c(3, 4), c(5, 2)))
(poly1 <- st_polygon(pl))

Features made of multiple geometries

# MULTIPOINT
mpmat <- rbind(c(5, 2), c(1, 3), c(3, 4))
(mp1 <- st_multipoint(mpmat))

# MULTILINESTRING
mll <- list(rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2)), rbind(c(1, 
    2), c(2, 4)))
(mls1 <- st_multilinestring((mll)))

# MULTIPOLYGON
mpyl <- list(list(rbind(c(1, 5), c(2, 2), c(4, 1), c(1, 5))), 
    list(rbind(c(0, 2), c(1, 2), c(1, 3), c(0, 3), c(0, 2))))
(mpoly1 <- st_multipolygon(mpyl))

Combination of geometries

# GEOMETRYCOLLECTION
gcl <- list(ls1, pt1, mpoly1)
(geomcol <- st_geometrycollection(gcl))

Building up spatial data frames in R

From sfg to sfc

  • Objects we’ve made so far have class sfg
class(pt1)
## [1] "XY"    "POINT" "sfg"
class(mp1)
## [1] "XY"         "MULTIPOINT" "sfg"
  • A simple feature geometry column (sfc) is a list of sfg objects
  • Important because usually we have more than one feature

Combine simple features with st_sfc()

# sfc POINT
pt1 <- st_point(c(5, 2))
pt2 <- st_point(c(1, 3))
(points_sfc <- st_sfc(pt1, pt2))
## Geometry set for 2 features 
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1 ymin: 2 xmax: 5 ymax: 3
## CRS:            NA

class(points_sfc)
## [1] "sfc_POINT" "sfc"
is.list(points_sfc)  # is this a list?
## [1] TRUE
is.list(mp1)  # is this (multipoint) a list?
## [1] FALSE
st_geometry_type(points_sfc)
## [1] POINT POINT
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT
... TRIANGLE

From sfc to sf

  • sfg and sfc only tell you about locations

  • to these we add non-geographic attributes associated with each feature

  • an sf object is a data frame combines these two things:
  • attributes are stored in a regular data.frame
  • location information (co-ordinates) is stored in one (simple feature geometry) column (sfc) of this data frame

  • Simple features = data frames with spatial attributes stored in a list-column

lnd_point <- st_point(c(0.1, 51.5))                 # sfg object
lnd_geom <- st_sfc(lnd_point, crs = 4326)           # sfc object
lnd_attrib <- data.frame(                           # data.frame object
  name = "London",
  temperature = 25,
  date = as.Date("2017-06-21")
)
lnd_sf <- st_sf(lnd_attrib, geometry = lnd_geom)    # sf object
lnd_sf
## Simple feature collection with 1 feature and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 0.1 ymin: 51.5 xmax: 0.1 ymax: 51.5
## CRS:            EPSG:4326
##     name temperature       date         geometry
## 1 London          25 2017-06-21 POINT (0.1 51.5)

Coordinate reference systems (CRS)

CRSs and projections

  • Longitude and latitude identify locations on the Earth’s surface using two angles



  • Angles depend on the choice of the Earth’s shape
  • The shape is one part of a “coordinate reference system”: the datum

CRSs and projections

  • Lat-Long coordinates aren’t “Cartesian” so Euclidean distances don’t work

  • To put points onto a flat surface we use a projection

  • Projections can’t perfectly reproduce all aspects of the Earth’s surface, because its not a perfect sphere or ellipsoid.

  • Different projections preserve different properties, and are optimized for different regions of the Earth’s surface

  • The projection is another part of the CRS

CRSs in R

st_crs(lnd_sf)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCS["WGS 84",
##     DATUM["WGS_1984",
##         SPHEROID["WGS 84",6378137,298.257223563,
##             AUTHORITY["EPSG","7030"]],
##         AUTHORITY["EPSG","6326"]],
##     PRIMEM["Greenwich",0,
##         AUTHORITY["EPSG","8901"]],
##     UNIT["degree",0.0174532925199433,
##         AUTHORITY["EPSG","9122"]],
##     AUTHORITY["EPSG","4326"]]

CRS’s can be specified using an epsg code or a proj4string definition

CRSs in R

  • The epsg code refers to one well-defined coordinate reference system

  • For e.g., “4326” is standard lat-long

  • proj4string has various options and gives more flexibility

CRSs in R

Read in a lat-long dataset

x_ll <- read.csv("data/confirmed-sightings.csv")
head(x_ll)
##              Mountain Latitude Longitude Elevation
## 1 Jargalant Khairkhan 47.74961  92.48772        NA
## 2   Kharkhiraa-Turgen 49.76821  91.83882      1600
## 3   Khasagt Khairkhan 46.77045  96.08086      2389
## 4         Tsambagarav 48.72750  90.70322      2752
## 5         Ikh Bogd NP 44.94847 100.03316      2224
## 6       Altan Khukhii 48.69467  91.68228      2670

CRSs in R

x_ll <- st_as_sf(x_ll, coords = c("Longitude", "Latitude"), crs = 4326)
st_crs(x_ll)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCS["WGS 84",
##     DATUM["WGS_1984",
##         SPHEROID["WGS 84",6378137,298.257223563,
##             AUTHORITY["EPSG","7030"]],
##         AUTHORITY["EPSG","6326"]],
##     PRIMEM["Greenwich",0,
##         AUTHORITY["EPSG","8901"]],
##     UNIT["degree",0.0174532925199433,
##         AUTHORITY["EPSG","9122"]],
##     AUTHORITY["EPSG","4326"]]

CRSs in R

d_ll <- st_distance(x_ll[1, ], x_ll[2, ])
d_ll
## Units: [m]
##          [,1]
## [1,] 229489.4

Reprojecting

x_utm <- st_transform(x_ll, 
                      crs = "+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs")
st_crs(x_utm)
## Coordinate Reference System:
## User input: +proj=utm +zone=48 +datum=WGS84 +units=m
+no_defs
## wkt:
## PROJCS["UTM Zone 48, Northern Hemisphere",
## GEOGCS["WGS 84",
## DATUM["WGS_1984",
## SPHEROID["WGS 84",6378137,298.257223563,
## AUTHORITY["EPSG","7030"]],
## AUTHORITY["EPSG","6326"]],
## PRIMEM["Greenwich",0,
## AUTHORITY["EPSG","8901"]],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AUTHORITY["EPSG","4326"]],
## PROJECTION["Transverse_Mercator"],
## PARAMETER["latitude_of_origin",0],
## PARAMETER["central_meridian",105],
## PARAMETER["scale_factor",0.9996],
## PARAMETER["false_easting",500000],
## PARAMETER["false_northing",0],
## UNIT["Meter",1]]

Reprojecting

d_utm <- st_distance(x_utm[1, ], x_utm[2, ])
c(d_ll, d_utm)
## Units: [m]
## [1] 229489.4 231902.2

Reading in spatial data

From a shape file

occupancies <- st_read("data/Mongolia_SL.shp", quiet = TRUE)
head(occupancies)
## Simple feature collection with 6 features and 2 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 26817.32 ymin: 5744980 xmax: 106817.3 ymax:
5804980
## CRS: 32648
## Team Occ geometry
## 1 A 0.20980194 POLYGON ((66817.32 5784980,...
## 2 A 0.28545357 POLYGON ((86817.32 5784980,...
## 3 A 0.25328456 POLYGON ((66817.32 5764980,...
## 4 A 0.14932811 POLYGON ((86817.32 5764980,...
## 5 A 0.28437589 POLYGON ((26817.32 5744980,...
## 6 A 0.09210202 POLYGON ((46817.32 5744980,...

plot(st_geometry(occupancies))

plot(occupancies["Team"])

plot(occupancies["Occ"])

occupancies %>% st_crs()  # check the CRS
## Coordinate Reference System:
##   User input: 32648 
##   wkt:
## PROJCS["WGS 84 / UTM zone 48N",
##     GEOGCS["WGS 84",
##         DATUM["WGS_1984",
##             SPHEROID["WGS 84",6378137,298.257223563,
##                 AUTHORITY["EPSG","7030"]],
##             AUTHORITY["EPSG","6326"]],
##         PRIMEM["Greenwich",0,
##             AUTHORITY["EPSG","8901"]],
##         UNIT["degree",0.0174532925199433,
##             AUTHORITY["EPSG","9122"]],
##         AUTHORITY["EPSG","4326"]],
##     PROJECTION["Transverse_Mercator"],
##     PARAMETER["latitude_of_origin",0],
##     PARAMETER["central_meridian",105],
##     PARAMETER["scale_factor",0.9996],
##     PARAMETER["false_easting",500000],
##     PARAMETER["false_northing",0],
##     UNIT["metre",1,
##         AUTHORITY["EPSG","9001"]],
##     AXIS["Easting",EAST],
##     AXIS["Northing",NORTH],
##     AUTHORITY["EPSG","32648"]]
occupancies <- st_read("data/Mongolia_SL.shp", quiet = TRUE) %>% 
    st_set_crs("+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs")
occupancies %>% st_crs()
## Coordinate Reference System:
## User input: +proj=utm +zone=48 +datum=WGS84 +units=m
+no_defs
## wkt:
## PROJCS["UTM Zone 48, Northern Hemisphere",
## GEOGCS["WGS 84",
## DATUM["WGS_1984",
## SPHEROID["WGS 84",6378137,298.257223563,
## AUTHORITY["EPSG","7030"]],
## AUTHORITY["EPSG","6326"]],
## PRIMEM["Greenwich",0,
## AUTHORITY["EPSG","8901"]],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AUTHORITY["EPSG","4326"]],
## PROJECTION["Transverse_Mercator"],
## PARAMETER["latitude_of_origin",0],
## PARAMETER["central_meridian",105],
## PARAMETER["scale_factor",0.9996],
## PARAMETER["false_easting",500000],
## PARAMETER["false_northing",0],
## UNIT["Meter",1]]

From a spreadsheet or non-spatial data frame

sights <- read.csv("data/confirmed-sightings.csv")
sights <- st_as_sf(sights, coords = c("Longitude", "Latitude"), 
    crs = 4326)
sights %>% st_crs()
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCS["WGS 84",
##     DATUM["WGS_1984",
##         SPHEROID["WGS 84",6378137,298.257223563,
##             AUTHORITY["EPSG","7030"]],
##         AUTHORITY["EPSG","6326"]],
##     PRIMEM["Greenwich",0,
##         AUTHORITY["EPSG","8901"]],
##     UNIT["degree",0.0174532925199433,
##         AUTHORITY["EPSG","9122"]],
##     AUTHORITY["EPSG","4326"]]
# reprojecting
my_crs <- "+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs"
sights <-  st_transform(sights, crs = my_crs)

plot(sights["Elevation"])

Raster from file

raster_filepath <- system.file("raster/srtm.tif", package = "spDataLarge")
new_raster <- raster(raster_filepath)
plot(new_raster)

Raster from data frame

mydata <- expand.grid(x = 1:100, y = 1:100) %>% 
  mutate(z1 = x + y, z2 = x - y)
myraster <- rasterFromXYZ(mydata)
plot(myraster[["z2"]])

Raster from scratch

raster1 <- raster(nrows = 10, ncols = 10, res = 1, xmn = 0, xmx = 10, 
    ymn = 0, ymx = 10, vals = 1:100)
plot(raster1)

Multiple layer rasters

raster2 <- raster(nrows = 10, ncols = 10, res = 1, xmn = 0, xmx = 10, 
    ymn = 0, ymx = 10, vals = runif(100))
myrb <- brick(raster1, raster2)
plot(myrb)

Manipulating spatial data

Filter

dplyr verbs work with sf objects in intuitive ways

occ_gt90 <- occupancies %>% filter(Occ > 0.6)
plot(occ_gt90)

Select polygons with a sighting

Some specialised spatial functions like st_intersects

sights_per_cell <- st_intersects(x = occupancies, y = sights)
sel_logical <- lengths(sights_per_cell) > 0
occ_with_sights <- occupancies[sel_logical, ]
plot(occ_with_sights["Occ"])

Grouped summarize

occupancies %>% group_by(Team) %>% summarize(meanOcc = mean(Occ)) %>% 
    head()
## Simple feature collection with 6 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -773182.7 ymin: 4924980 xmax: 426827.5 ymax:
5804980
## CRS: +proj=utm +zone=48 +datum=WGS84 +units=m +no_defs
## # A tibble: 6 x 3
## Team meanOcc geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 A 0.235 MULTIPOLYGON (((-53182.68 5484980, -53182.68
5504980, -33182.68…
## 2 B 0.473 MULTIPOLYGON (((-413182.7 5424980, -413182.7
5444980, -393182.7…
## 3 C 0.330 POLYGON ((-553182.7 5344980, -553182.7
5364980, -573182.7 53649…
## 4 D 0.407 MULTIPOLYGON (((-253182.7 5324980, -253182.7
5344980, -253182.7…
## 5 E 0.599 MULTIPOLYGON (((-473182.7 5104980, -473182.7
5124980, -453182.7…
## 6 F 0.318 MULTIPOLYGON (((386819.9 5144974, 386819.9
5164970, 366827.5 51…

Can also summarize geometries

occ_per_team <- occupancies %>% group_by(Team) %>%
  summarize(meanOcc = mean(Occ),
            geometry = st_union(geometry))
plot(occ_per_team["meanOcc"])

Static maps

tmap

  • similar syntax to ggplot2, tailored to maps
  • shape is a spatial object with class sf, sp or raster.
map0 <- tm_shape(occupancies) + tm_fill()
map1 <- tm_shape(occupancies) + tm_borders()
tmap_arrange(map0, map1, nrow = 1)

map2 <- tm_shape(occupancies) + tm_polygons()
map3 <- tm_shape(occupancies) + tm_polygons("Occ")
tmap_arrange(map2, map3, nrow = 1)

  • Multiple layers per shape
tm_shape(occupancies) + tm_polygons("Occ") + tm_text("Team", 
    size = 0.4)

  • Multiple shapes
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Elevation", size = 0.5)
Color settings are especially important for maps. Here use of the same palette obscures detail.

Color settings are especially important for maps. Here use of the same palette obscures detail.

  • Sequential palettes are good for values that have natural ordering from low to high
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Elevation", size = 0.5, palette = "BuGn")

  • Categorical palettes are best for data with no natural ordering
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Mountain", size = 0.5, palette = "Set3")

  • Diverging palettes emphasise differences from some central reference point
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Elevation", size = 0.5, palette = "PRGn")

  • faceting
tm_shape(occupancies) +
  tm_polygons(c("Occ", "Team")) +
  tm_facets(ncol = 2)

ggplot

  • geom_sf plots sf objects
ggplot() + geom_sf(data = occupancies)

  • geom_sf plots sf objects
ggplot() + geom_sf(data = occupancies, aes(fill = Occ))

  • Can plot multiple sf objects
ggplot() + geom_sf(data = occupancies, aes(fill = Occ)) +
  geom_sf(data = sights, colour = "red")

  • scale_XXX_brewer for discrete scales, scale_XXX_distiller for continuous scales
ggplot() + geom_sf(data = occupancies, aes(fill = Occ)) +
  geom_sf(data = sights, aes(colour = Elevation)) +
  scale_colour_distiller(palette = "PRGn")

Interactive maps

package leaflet

  • provides an interface to the Leaflet JavaScript library
  • leaflet() makes a map object that can be piped to other leaflet functions
leaflet() %>% addTiles()

  • Set a starting origin and zoom level
leaflet() %>% addTiles() %>% setView(lng = 94.1, lat = 47.4, 
    zoom = 7)

  • Set a starting origin and zoom level
leaflet() %>% addTiles() %>%
  setView(lng = 94.1, lat = 47.4, zoom = 4)

  • Various base map options with addProviderTiles()
leaflet() %>% addProviderTiles("Esri.WorldStreetMap") %>%
  setView(lng = 94.1, lat = 47.4, zoom = 4)

  • Various base map options with addProviderTiles()
leaflet() %>% addProviderTiles("Esri.WorldImagery") %>%
  setView(lng = 94.1, lat = 47.4, zoom = 4)

  • sf objects can be passed as additional map layers
  • must be in lat/long format
leaflet(data = st_transform(occupancies, crs = 4326)) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addPolygons()

  • Multiple map layers
leaflet(data = st_transform(occupancies, crs = 4326)) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addPolygons() %>%
  addMarkers(data = st_transform(sights, crs = 4326))

pal <- colorNumeric("YlOrRd", domain = occupancies$Occ)
leaflet(st_transform(occupancies, crs = 4326)) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addMarkers(data = st_transform(sights, crs = 4326)) %>%
  addPolygons(color = ~pal(Occ), fillOpacity = 0.4) %>%
  addLegend(pal = pal, values = ~Occ) %>%
  addMiniMap(position = "bottomleft")

Further reading